Model Selection

Multi-Scenario Applicability


Nano Image Captioning

A lightweight image captioning model built on bert-tiny and vit-tiny. It is only 40 MB in size and runs inference quickly on CPU.

Transformers English

Vitpose Plus Huge

ViTPose++ is a Vision Transformer-based foundation model for human pose estimation that achieves 81.1 AP on the MS COCO keypoint test set.

Pose Estimation

A vision-language model that pairs a ViT image encoder with a distilled GPT-2 text decoder to generate image captions.

T5 Base Spellchecker

A spell checker built on the T5-Base transformer for detecting and correcting text spelling errors.

Large Language Model

Featured AI Models

AIbase

Empowering the Future, Your AI Solution Knowledge Base


© 2025 AIbase